12 research outputs found

    Fast Algorithm and Implementation of Dissimilarity Self-Organizing Maps

    Get PDF
    In many real world applications, data cannot be accurately represented by vectors. In those situations, one possible solution is to rely on dissimilarity measures that enable sensible comparison between observations. Kohonen's Self-Organizing Map (SOM) has been adapted to data described only through their dissimilarity matrix. This algorithm provides both non linear projection and clustering of non vector data. Unfortunately, the algorithm suffers from a high cost that makes it quite difficult to use with voluminous data sets. In this paper, we propose a new algorithm that provides an important reduction of the theoretical cost of the dissimilarity SOM without changing its outcome (the results are exactly the same as the ones obtained with the original algorithm). Moreover, we introduce implementation methods that result in very short running times. Improvements deduced from the theoretical cost model are validated on simulated and real world data (a word list clustering problem). We also demonstrate that the proposed implementation methods reduce by a factor up to 3 the running time of the fast algorithm over a standard implementation

    Self-organizing maps and symbolic data

    Get PDF
    International audienceIn data analysis new forms of complex data have to be considered like for example (symbolic data, functional data, web data, trees, SQL query and multimedia data, ...). In this context classical data analysis for knowledge discovery based on calculating the center of gravity can not be used because input are not Rp\mathbb{R}^p vectors. In this paper, we present an application on real world symbolic data using the self-organizing map. To this end, we propose an extension of the self-organizing map that can handle symbolic data

    Une adaptation des cartes auto-organisatrices pour des données décrites par un tableau de dissimilarités

    Get PDF
    National audienceMany data analysis methods cannot be applied to data that are not represented by a fixed number of real values, whereas most of real world observations are not readily available in such a format. Vector based data analysis methods have therefore to be adapted in order to be used with non standard complex data. A flexible and general solution for this adaptation is to use a (dis)similarity measure. Indeed, thanks to expert knowledge on the studied data, it is generally possible to define a measure that can be used to make pairwise comparison between observations. General data analysis methods are then obtained by adapting existing methods to (dis)similarity matrices. In this article, we propose an adaptation of Kohonen's Self Organizing Map (SOM) to (dis)similarity data. The proposed algorithm is an adapted version of the vector based batch SOM. The method is validated on real world data: we provide an analysis of the usage patterns of the web site of the Institut National de Recherche en Informatique et Automatique, constructed thanks to web log mining method

    Extraction de données symboliques et cartes topologiques: application aux données ayant une structure complexe

    No full text
    The aim of symbolic data analysis is to provide a better representation of the variations and imprecision contained in real data. As such data express a higher level of knowledge; the representation must offer a richer formalism than that provided by classical data analysis. A generalization process exists that allows data to be synthesized and represented by means of an assertion formalism that was defined in symbolic data analysis. This generalization process is supervised and often sensitive to virtual and atypical individuals. When the data to be generalized is heterogeneous, some assertions include virtual individuals. Faced with this new formalism and the resulting semantic extension that symbolic data analysis offers, a new approach to processing and interpreting data is required. The original contributions of our work concern new approaches to representing and clustering complex data. First, we propose a decomposition step, based on a divisive clustering algorithm, that improves the generalization process while offering the symbolic formalism. We also propose a unsupervised generalization process based on the self-organizing map. The advantage of this method is that it enables the data to be reduced in an unsupervised way and allows the resulting homogeneous clusters to be represented by symbolic formalism. The second contribution of our work is a development of a clustering method to handle complex data. The method is an adaptation of the batch version of the self-organizing map to dissimilarity tables. Only the definition of an adequate dissimilarity is required for the method to operate efficiently.Un des objectifs de lanalyse de données symboliques est de permettre une meilleure modélisation des variations et des imprécisions des données réelles. Ces données expriment en effet, un niveau de connaissance plus élevé, la modélisation doit donc offrir un formalisme plus riche que dans le cadre de lanalyse de données classiques. Un ensemble dopérateurs de généralisation symbolique existent et permettent une synthèse et représentation des données par le formalisme des assertions, formalisme défini en analyse de données symboliques. Cette généralisation étant supervisée, est souvent sensible aux observations aberrantes. Lorsque les données que lon souhaite généraliser sont hétérogènes, certaines assertions incluent des observations virtuelles. Face à ce nouveau formalisme et donc cette extension dordre sémantique que lanalyse de données symbolique a apporté, une nouvelle approche de traitement et dinterprétation simpose. Notre objectif au cours de ce travail est daméliorer tout dabord cette généralisation et de proposer ensuite une méthode de traitement de ces données. Les contributions originales de cette thèse portent sur de nouvelles approches de représentation et de classification des données à structure complexe. Nous proposons donc une décomposition permettant daméliorer la généralisation tout en offrant le formalisme symbolique. Cette décomposition est basée sur un algorithme divisif de classification. Nous avons aussi proposé une méthode de généralisation symbolique non supervisée basée sur l'algorithme des cartes topologiques de Kohonen. L'avantage de cette méthode est de réduire les données d'une manière non supervisée et de modéliser les groupes homogènes obtenus par des données symboliques. Notre seconde contribution porte sur lélaboration dune méthode de classification traitant les données à structure complexe. Cette méthode est une adaptation de la version batch de lalgorithme des cartes topologiques de Kohonen aux tableaux de dissimilarités. En effet, seule la définition dune mesure de dissimilarité adéquate, est nécessaire pour le bon déroulement de la méthode
    corecore